Gradual Machine Learning for Entity Resolution

Abstract

Usually considered as a classification problem, entity resolution (ER) can be very challenging on real data due to the prevalence of dirty values. The state-of-the-art solutions for ER were built on a variety of learning models (most notably deep neural networks), which require lots of accurately labeled training data. Unfortunately, high-quality labeled data usually require expensive manual work, and are therefore not readily available in many scenarios. In this paper, we propose a novel paradigm for ER, called gradual machine learning, which aims to enable effective labeling without the requirement for manual effort. It begins with some easy instances in a task, which can be automatically labeled with high accuracy, and then gradually labels more challenging instances by iterative factor graph inference. In gradual machine learning, the hard instances in a task are labeled in small stages based on the estimated evidential certainty provided by the labeled easier instances. Our extensive experiments have shown that the performance of the proposed approach is considerably better than its unsupervised alternatives, and highly competitive compared to supervised techniques. Using ER as a test case, we demonstrate that gradual machine learning is a promising paradigm potentially applicable to other tasks requiring...
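The easy-first loop described in the abstract can be illustrated with a toy sketch. This is not the paper's method: the iterative factor graph inference is replaced here by a much simpler stand-in (copying the label of the closest already-labeled pair), and the function name `gradual_label`, the thresholds `low` and `high`, and the `batch` size are all hypothetical choices made for illustration. The input is assumed to be one similarity score in [0, 1] per candidate record pair.

```python
def gradual_label(similarities, low=0.2, high=0.8, batch=1):
    """Label candidate record pairs easy-first (True = match).

    Toy stand-in for gradual machine learning: the paper uses
    iterative factor graph inference, which is NOT implemented here.
    """
    labels = {}

    # Stage 1: easy instances are labeled automatically.
    # Very high similarity -> match, very low similarity -> non-match.
    for i, s in enumerate(similarities):
        if s >= high:
            labels[i] = True
        elif s <= low:
            labels[i] = False
    if not labels:
        raise ValueError("no easy instances to start from")

    # Stage 2: label the harder instances in small stages.
    unlabeled = [i for i in range(len(similarities)) if i not in labels]
    while unlabeled:
        scored = []
        for i in unlabeled:
            # Evidence (assumed form): the closest already-labeled pair.
            j = min(labels, key=lambda k: abs(similarities[k] - similarities[i]))
            certainty = 1.0 - abs(similarities[j] - similarities[i])
            scored.append((certainty, i, labels[j]))
        # Commit only the most certain instances in this stage, so that
        # newly labeled pairs can serve as evidence in the next stage.
        scored.sort(key=lambda t: t[0], reverse=True)
        for _, i, label in scored[:batch]:
            labels[i] = label
            unlabeled.remove(i)

    return [labels[i] for i in range(len(similarities))]
```

Calling `gradual_label([0.95, 0.05, 0.7, 0.3, 0.6])` labels the first two pairs directly from the thresholds and then works through the remaining three, one per stage. A smaller `batch` means each stage can draw on more previously labeled evidence, at the cost of more iterations.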

Similar resources

ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution

Entity resolution (ER), an important and common data cleaning problem, is about detecting data duplicate representations for the same external entities, and merging them into single representations. Relatively recently, declarative rules called matching dependencies (MDs) have been proposed for specifying similarity conditions under which attribute values in database records are merged. In this...

Machine Learning for Anaphora Resolution

We compare the performance of the Kennedy and Boguraev [12] anaphora resolution algorithm to the performance of a memory-based machine learning algorithm. For each pronoun, the machine learning algorithm is given the Kennedy and Boguraev feature set for a list of candidate antecedents, but it does not have access to the salience weights for the features. Nevertheless, a statistical analysis sho...

Machine Learning for Entity Coreference Resolution: A Retrospective Look at Two Decades of Research

Though extensively investigated since the 1960s, entity coreference resolution, a core task in natural language understanding, is far from being solved. Nevertheless, significant progress has been made on learning-based coreference research since its inception two decades ago. This paper provides an overview of the major milestones made in learning-based coreference research and discusses a hard...

Entity Resolution and Federated Learning get a Federated Resolution

Consider two data providers, each maintaining records of different feature sets about common entities. They aim to learn a linear model over the whole set of features. This problem of federated learning over vertically partitioned data includes a crucial upstream issue: entity resolution, i.e. finding the correspondence between the rows of the datasets. It is well known that entity resolution, ...

A Machine Learning approach to Generic Entity Resolution in support of Cyber Situation Awareness

This paper introduces the Generic Entity Resolution (GER) framework; a framework that classifies pairs of entities as matching or non-matching based on the entities’ features and their semantic relationships with other entities. The GER framework has been developed as part of an AI-based system for the development of Cyber situational awareness and provides a data fusion role by resolving entit...

Journal

Journal title: IEEE Transactions on Knowledge and Data Engineering

Year: 2022

ISSN: 1558-2191, 1041-4347, 2326-3865

DOI: https://doi.org/10.1109/tkde.2020.3006142